NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Programming Model for GPU Load Balancing

https://doi.org/10.1145/3572848.3577434

Osama, Muhammad; Porumbescu, Serban D.; Owens, John D. (February 2023, Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior to our work, the only way to unleash the GPU’s potential on irregular problems has been to balance workloads through application-specific, tightly coupled load-balancing techniques. With our open-source load-balancing framework, we hope to improve programmers’ productivity when developing irregular-parallel algorithms on the GPU, and also to improve the overall performance of such applications by offering a quick path to experimentation with a variety of existing load-balancing techniques. We also hope that separating the concerns of load balancing from work processing within our abstraction will make existing code easier to manage and to extend to future architectures.
Full Text Available
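The decoupling of load balancing from work processing described in this abstract can be illustrated with a minimal, sequential C++ sketch. It assumes work is described by CSR-style segment offsets (for example, one segment per vertex's neighbor list) and implements just one simple schedule: an even split of flat work items, with a binary search mapping each item back to its owning segment. `FlatWorkSchedule` and `for_each_balanced` are hypothetical names, not the framework's API, and the outer loop merely stands in for GPU threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical schedule: owns the mapping from flat work items to
// (segment, offset) pairs. Segment s owns items [offsets[s], offsets[s+1]).
struct FlatWorkSchedule {
  std::vector<std::size_t> offsets;  // size = num_segments + 1, non-decreasing

  std::size_t total_work() const { return offsets.back(); }

  // Binary search for the segment owning flat item i. upper_bound returns the
  // first offset > i, so the owner is the entry just before it; empty
  // segments are skipped automatically.
  std::size_t segment_of(std::size_t i) const {
    auto it = std::upper_bound(offsets.begin(), offsets.end(), i);
    return static_cast<std::size_t>(it - offsets.begin()) - 1;
  }
};

// Each simulated worker takes an equal-sized contiguous chunk of flat items,
// so load stays even however skewed the segment sizes are. `work` is the
// user's per-item function and never sees the balancing logic.
template <typename Work>
void for_each_balanced(const FlatWorkSchedule& sched, std::size_t num_workers, Work work) {
  assert(num_workers > 0);
  const std::size_t n = sched.total_work();
  const std::size_t chunk = (n + num_workers - 1) / num_workers;
  for (std::size_t w = 0; w < num_workers; ++w) {  // on a GPU: one thread per w
    const std::size_t begin = std::min(n, w * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    for (std::size_t i = begin; i < end; ++i) {
      const std::size_t seg = sched.segment_of(i);
      work(seg, i - sched.offsets[seg]);
    }
  }
}
```

Because the `work` callback only receives `(segment, offset)` pairs, a different schedule could replace the even split without changing it, which is the separation of concerns the abstract argues for.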
Analyzing and Implementing GPU Hash Tables

https://doi.org/10.1137/1.9781611977578.ch3

Awad, Muhammad A.; Ashkiani, Saman; Porumbescu, Serban D.; Farach-Colton, Martín; Owens, John D. (January 2023, SIAM Symposium on Algorithmic Principles of Computer Systems)

We revisit the problem of building static hash tables on the GPU and present an efficient implementation of bucketed hash tables. By decoupling the probing scheme from the hash table in-memory representation, we offer an implementation where the number of probes and the bucket size are the only factors limiting performance. Our analysis sweeps through the hash table parameter space for two probing schemes: cuckoo and iceberg hashing. We show that a bucketed cuckoo hash table (BCHT) that uses three hash functions outperforms alternative methods that use iceberg hashing and a cuckoo hash table that uses a bucket size of one. At load factors as high as 0.99, BCHT enjoys an average probe count of 1.43 during insertion. Using only three hash functions, positive and negative queries require at most 1.39 and 2.8 average probes per key, respectively.
Full Text Available
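The bucketed cuckoo layout discussed above can be sketched sequentially in C++ to show why query probe counts stay small: a lookup inspects at most three buckets. The multiplicative hash functions, the bucket size of 4, the random-walk eviction policy, and the class name `BucketedCuckooTable` are all assumptions for illustration; this is not the paper's GPU implementation and does not reproduce its measurements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <utility>
#include <vector>

// Sequential sketch of a bucketed cuckoo hash table: each key may live in any
// of kNumHashes candidate buckets, each holding kBucketSize slots.
class BucketedCuckooTable {
 public:
  static constexpr int kNumHashes = 3;
  static constexpr int kBucketSize = 4;                   // hypothetical tuning knob
  static constexpr int kMaxEvictions = 500;               // give up after this many
  static constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;    // reserved, cannot be a key
  using Entry = std::pair<std::uint32_t, std::uint32_t>;  // (key, value)

  explicit BucketedCuckooTable(std::size_t num_buckets) : buckets_(num_buckets) {
    assert(num_buckets > 0);
    for (auto& bucket : buckets_) bucket.fill(Entry{kEmpty, 0u});
  }

  // Inserts or updates `key`. Returns false if no free slot was found within
  // kMaxEvictions evictions; the last evicted entry is then dropped, so a real
  // implementation would rebuild with new hash functions instead.
  bool insert(std::uint32_t key, std::uint32_t value) {
    assert(key != kEmpty);
    if (Entry* slot = find_slot(key)) { slot->second = value; return true; }
    Entry item{key, value};
    for (int step = 0; step < kMaxEvictions; ++step) {
      // Place the item in the first free slot among its candidate buckets.
      for (int h = 0; h < kNumHashes; ++h)
        for (Entry& slot : buckets_[bucket_index(item.first, h)])
          if (slot.first == kEmpty) { slot = item; return true; }
      // All candidates are full: swap with a random resident and re-home it.
      auto& bucket = buckets_[bucket_index(item.first, static_cast<int>(rng_() % kNumHashes))];
      std::swap(item, bucket[rng_() % kBucketSize]);
    }
    return false;
  }

  // A query (positive or negative) probes at most kNumHashes buckets.
  std::optional<std::uint32_t> find(std::uint32_t key) const {
    for (int h = 0; h < kNumHashes; ++h)
      for (const Entry& slot : buckets_[bucket_index(key, h)])
        if (slot.first == key) return slot.second;
    return std::nullopt;
  }

 private:
  std::size_t bucket_index(std::uint32_t key, int h) const {
    // Multiplicative hashing with a different odd 64-bit constant per function.
    static constexpr std::uint64_t kMultipliers[kNumHashes] = {
        0x9E3779B97F4A7C15ull, 0xC2B2AE3D27D4EB4Full, 0x165667B19E3779F9ull};
    std::uint64_t x = (static_cast<std::uint64_t>(key) + 1) * kMultipliers[h];
    return static_cast<std::size_t>((x >> 32) % buckets_.size());
  }

  Entry* find_slot(std::uint32_t key) {
    for (int h = 0; h < kNumHashes; ++h)
      for (Entry& slot : buckets_[bucket_index(key, h)])
        if (slot.first == key) return &slot;
    return nullptr;
  }

  std::vector<std::array<Entry, kBucketSize>> buckets_;
  std::mt19937 rng_{12345};
};
```

Keeping each bucket's slots contiguous is what makes a bucket a natural unit of work; a GPU version would plausibly have a group of threads inspect one bucket cooperatively, whereas this CPU sketch simply iterates over one `std::array`.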
Atos: A Task-Parallel GPU Scheduler for Graph Analytics

https://doi.org/10.1145/3545008.3545056

Chen, Yuxin; Brock, Benjamin; Porumbescu, Serban; Buluç, Aydın; Yelick, Katherine; Owens, John D. (August 2022, Proceedings of the 51st International Conference on Parallel Processing)

We present Atos, a task-parallel GPU dynamic scheduling framework that is especially suited to dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, giving users the flexibility to balance between them to achieve optimal performance. In addition, Atos lets users adapt to different use cases by controlling the kernel strategy and task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across three case studies, compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each speedup. This deeper understanding allows us to derive general guidelines for how to select the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.
Full Text Available
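The relaxed-dependency, task-parallel style that this abstract contrasts with BSP can be illustrated with a worklist breadth-first search. This is a minimal sequential C++ sketch under stated assumptions: the graph is stored in CSR form, each task is a single vertex, and tasks are popped in LIFO order to imitate out-of-order execution. `Csr` and `worklist_bfs` are hypothetical names; Atos's actual interfaces, kernel strategies, and load balancing are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// CSR graph: neighbors of u are col_indices[row_offsets[u] .. row_offsets[u+1]).
struct Csr {
  std::vector<std::uint32_t> row_offsets;  // size = num_vertices + 1
  std::vector<std::uint32_t> col_indices;
};

constexpr std::uint32_t kUnvisited = std::numeric_limits<std::uint32_t>::max();

// No per-level barrier: processing a vertex may lower a neighbor's depth and
// spawn a new task immediately. A depth that is set too high by an
// out-of-order task is corrected when a shorter path is found later.
std::vector<std::uint32_t> worklist_bfs(const Csr& g, std::uint32_t source) {
  const std::size_t n = g.row_offsets.size() - 1;
  assert(source < n);
  std::vector<std::uint32_t> depth(n, kUnvisited);
  depth[source] = 0;
  std::vector<std::uint32_t> worklist{source};  // on a GPU: a shared task queue
  while (!worklist.empty()) {
    const std::uint32_t u = worklist.back();  // LIFO: deliberately not level order
    worklist.pop_back();
    for (std::uint32_t e = g.row_offsets[u]; e < g.row_offsets[u + 1]; ++e) {
      const std::uint32_t v = g.col_indices[e];
      // On a GPU this compare-and-lower would be an atomicMin.
      if (depth[u] + 1 < depth[v]) {
        depth[v] = depth[u] + 1;
        worklist.push_back(v);  // spawn a task instead of waiting for a barrier
      }
    }
  }
  return depth;
}
```

For example, searching from vertex 0 on a graph with edges 0→1, 0→2, 1→4, 2→3, and 3→4 (neighbors stored in that order), this LIFO order first gives vertex 4 depth 3 through the longer path and later corrects it to 2. A BSP version would instead finish every depth-1 vertex before any depth-2 vertex, paying for a global barrier at each level.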
Essentials of Parallel Graph Analytics

https://doi.org/10.1109/IPDPSW55747.2022.00061

Osama, Muhammad; Porumbescu, Serban D.; Owens, John D. (May 2022, Proceedings of the Workshop on Graphs, Architectures, Programming, and Learning)

We identify the graph data structure, frontiers, operators, an iterative loop structure, and convergence conditions as essential components of graph analytics systems based on the native-graph approach. Using these essential components, we propose an abstraction that captures all the significant programming models within graph analytics, such as bulk-synchronous, asynchronous, shared-memory, message-passing, and push vs. pull traversals. Finally, we demonstrate the power of our abstraction with an elegant modern C++ implementation of single-source shortest path and its required components.
Full Text Available
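The essential components named in this abstract (graph data structure, frontiers, operators, an iterative loop, and a convergence condition) can be lined up against a small single-source shortest path in modern C++. This is a minimal sequential sketch, not the paper's implementation: `WeightedCsr`, `Frontier`, `advance`, and `sssp` are hypothetical names, the only operator is an advance whose callback also acts as a filter, and convergence is detected as an empty frontier.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Graph data structure: weighted CSR.
struct WeightedCsr {
  std::vector<int> row_offsets;  // size = num_vertices + 1
  std::vector<int> col_indices;  // destination of each edge
  std::vector<float> weights;    // weight of each edge
  int num_vertices() const { return static_cast<int>(row_offsets.size()) - 1; }
};

// Frontier: the set of active vertices for the current iteration.
using Frontier = std::vector<int>;

// Operator (advance): visit every out-edge of the input frontier and keep the
// destinations for which op(src, dst, weight) returns true.
template <typename Op>
Frontier advance(const WeightedCsr& g, const Frontier& in, Op op) {
  Frontier out;
  for (int src : in)
    for (int e = g.row_offsets[src]; e < g.row_offsets[src + 1]; ++e)
      if (op(src, g.col_indices[e], g.weights[e])) out.push_back(g.col_indices[e]);
  return out;
}

std::vector<float> sssp(const WeightedCsr& g, int source) {
  assert(source >= 0 && source < g.num_vertices());
  std::vector<float> dist(g.num_vertices(), std::numeric_limits<float>::infinity());
  dist[source] = 0.0f;
  Frontier frontier{source};
  // Iterative loop; convergence condition: no vertex improved last iteration.
  while (!frontier.empty()) {
    frontier = advance(g, frontier, [&](int u, int v, float w) {
      const float candidate = dist[u] + w;  // relax edge u -> v
      if (candidate < dist[v]) { dist[v] = candidate; return true; }
      return false;
    });
  }
  return dist;
}
```

The frontier may contain a vertex more than once; a separate filter or uniquify operator could remove duplicates without changing the loop structure.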
Dynamic Graphs on the GPU

https://doi.org/10.1109/IPDPS47924.2020.00081

Awad, Muhammad A.; Ashkiani, Saman; Porumbescu, Serban D.; Owens, John D. (May 2020, Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium)

We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves insertion rates 3.4–14.8x faster than the state of the art across a diverse set of large datasets, as well as deletion speedups of up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterizes and evaluates dynamic graph data structures.
Full Text Available
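The one-hash-table-per-vertex design can be sketched on the CPU with one `std::unordered_set` per vertex, which already gives expected constant-time edge insertion, deletion, and lookup, and rejects duplicate edges for free. This minimal C++ sketch assumes directed, unweighted edges; `DynamicGraph` and its methods are hypothetical, and the standard-library hash set only stands in for the GPU hash tables the paper uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Each vertex's adjacency list is its own hash set (a stand-in for a
// per-vertex GPU hash table).
class DynamicGraph {
 public:
  explicit DynamicGraph(std::uint32_t num_vertices) : adj_(num_vertices) {}

  // Vertex insertion: append a new, empty adjacency table; returns its id.
  std::uint32_t add_vertex() {
    adj_.emplace_back();
    return static_cast<std::uint32_t>(adj_.size() - 1);
  }

  // Returns true if the edge was new (duplicates are rejected by the set).
  bool insert_edge(std::uint32_t src, std::uint32_t dst) {
    assert(src < adj_.size() && dst < adj_.size());
    return adj_[src].insert(dst).second;
  }

  // Returns true if the edge existed and was removed.
  bool delete_edge(std::uint32_t src, std::uint32_t dst) {
    assert(src < adj_.size());
    return adj_[src].erase(dst) == 1;
  }

  bool has_edge(std::uint32_t src, std::uint32_t dst) const {
    return src < adj_.size() && adj_[src].count(dst) == 1;
  }

  // Vertex deletion (assumed policy): clear the vertex's own table and scan
  // every other table to drop incoming edges. The id stays allocated so that
  // other vertex ids do not shift.
  void delete_vertex(std::uint32_t v) {
    assert(v < adj_.size());
    adj_[v].clear();
    for (auto& table : adj_) table.erase(v);
  }

  std::size_t out_degree(std::uint32_t v) const { return adj_[v].size(); }

 private:
  std::vector<std::unordered_set<std::uint32_t>> adj_;
};
```

Removing incoming edges by scanning every table is a simplifying assumption of this sketch; tracking reverse edges would avoid the full scan at the cost of extra memory.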
